Abstract

We study the convergence of a class of discrete-time continuous-state simulated annealing type algorithms for multivariate optimization. The general algorithm that we consider is of the form $X_{k+1} = X_k - a_k(\nabla U(X_k) + \xi_k) + b_k W_k$. Here $U(\cdot)$ is a smooth function on a compact subset of $\mathbb{R}^r$, $\{\xi_k\}$ is a sequence of $\mathbb{R}^r$-valued random variables, $\{W_k\}$ is a sequence of independent standard $r$-dimensional Gaussian random variables, and $\{a_k\}$, $\{b_k\}$ are sequences of positive numbers which tend to zero. These algorithms arise by adding slowly decreasing white Gaussian noise to gradient descent, random search, and stochastic approximation algorithms. We show that under suitable conditions on $U(\cdot)$, $\{\xi_k\}$, $\{a_k\}$ and $\{b_k\}$, $X_k$ converges in probability to the set of global minima of $U(\cdot)$.
1. INTRODUCTION

It is desired to select a parameter value $x^*$ which minimizes a smooth function $U(x)$ over $x \in D$, where $D$ is a compact subset of $\mathbb{R}^r$. The stochastic descent algorithm

$$Z_{k+1} = Z_k - a_k(\nabla U(Z_k) + \xi_k) \qquad (1.1)$$

is often used, where $\{\xi_k\}$ is a sequence of $\mathbb{R}^r$-valued random variables and $\{a_k\}$ is a sequence of positive numbers with $a_k \to 0$ and $\sum_k a_k = \infty$. An algorithm of this type might arise in several ways. The sequence $\{Z_k\}$ could correspond to a stochastic approximation [1], where the sequence $\{\xi_k\}$ arises from noisy measurements of $\nabla U(\cdot)$ or $U(\cdot)$. The sequence $\{Z_k\}$ could also correspond to a random search [2], where the sequence $\{\xi_k\}$ arises from randomly selected search directions. Now since $D$ is compact it is necessary to ensure that the trajectories of $\{Z_k\}$ are bounded; this may be done either by projecting $Z_k$ back into $D$ if it ever leaves $D$, or by fixing the dynamics in (1.1) so that $Z_k$ never leaves $D$ or only leaves $D$ finitely many times w.p.1. Let $S$ be the set of local minima of $U(\cdot)$ and $S^*$ the set of global minima of $U(\cdot)$. Under suitable conditions on $U(\cdot)$, $\{\xi_k\}$ and $\{a_k\}$, and assuming that $\{Z_k\}$ is bounded, it is well known that $Z_k \to S$ as $k \to \infty$ w.p.1. In particular, if $U(\cdot)$ is well-behaved, $a_k = A/k$ for $k$ large, and $\{\xi_k\}$ are independent random variables such that $E\{|\xi_k|^2\} \le c\,a_k^{\alpha}$ and $|E\{\xi_k\}| \le c\,a_k^{\beta}$ where $\alpha > -1$, $\beta > 0$, and $c$ is a positive constant, then $Z_k \to S$ as $k \to \infty$ w.p.1. However, if $U(\cdot)$ has strictly local minima, then in general $Z_k \not\to S^*$ as $k \to \infty$ w.p.1.
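The trapping behaviour described above can be illustrated with a small sketch. The tilted double well $U(x) = (x^2-1)^2 + 0.3x$, the step cap of 0.05 and the two starting points are illustrative assumptions and do not come from the paper; the noise $\xi_k$ is set to zero, so this is plain gradient descent with $a_k = A/k$ for $k$ large.

```python
def U(x):
    # Hypothetical test function: a tilted double well with its global minimum
    # near x = -1.04 and a strictly local minimum near x = 0.96.
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad_U(x):
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(z0, A=1.0, steps=5000):
    # Iteration (1.1) with xi_k = 0 and a_k = min(0.05, A / k); the cap only
    # affects small k, so a_k = A / k for k large.
    z = z0
    for k in range(1, steps + 1):
        z -= min(0.05, A / k) * grad_U(z)
    return z


left = descend(-0.5)   # started in the basin of the global minimum
right = descend(0.5)   # started in the basin of the strictly local minimum
print(left, U(left))
print(right, U(right))
```

Which minimum is reached depends only on the starting basin, which is the behaviour the added noise term in (1.2) is meant to overcome.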

The analysis of the convergence w.p.1 of $\{Z_k\}$ is usually based on the convergence of an associated ordinary differential equation (ODE)

$$\dot z(t) = -\nabla U(z(t)).$$

This approach was pioneered by Ljung [3] and further developed by Kushner and Clark [4], Metivier and Priouret [5], and others. Kushner and Clark also analyzed the convergence in probability of $\{Z_k\}$ by this method. However, although their theory yields much useful information about the asymptotic behavior of $\{Z_k\}$ under very weak assumptions, it fails to obtain $Z_k \to S^*$ as $k \to \infty$ in probability unless $S$ is a singleton; see [4, p. 125].

Consider a modified stochastic descent algorithm

$$X_{k+1} = X_k - a_k(\nabla U(X_k) + \xi_k) + b_k W_k \qquad (1.2)$$

where $\{W_k\}$ is a sequence of independent Gaussian random variables with zero mean and identity covariance matrix, and $\{b_k\}$ is a sequence of positive numbers with $b_k \to 0$. The $b_k W_k$ term is added in artificially by Monte Carlo simulation so that $\{X_k\}$ can avoid getting trapped in a strictly local minimum of $U(\cdot)$. In general $X_k \not\to S^*$ as $k \to \infty$ w.p.1 (for the same reasons that $Z_k \not\to S^*$ as $k \to \infty$ w.p.1). However, under suitable conditions on $U(\cdot)$, $\{\xi_k\}$, $\{a_k\}$ and $\{b_k\}$, and assuming that $\{X_k\}$ is bounded, we shall show that $X_k \to S^*$ as $k \to \infty$ in probability. In particular, if $U(\cdot)$ is well-behaved, $a_k = A/k$ and $b_k^2 = B/(k \log\log k)$ for $k$ large where $B/A > C_0$ (a positive constant which depends only on $U(\cdot)$), and $\{\xi_k\}$ are independent random variables such that $E\{|\xi_k|^2\} \le c\,a_k^{\alpha}$ and $|E\{\xi_k\}| \le c\,a_k^{\beta}$ where $\alpha > -1$, $\beta > 0$, and $c$ is a positive constant, then $X_k \to S^*$ as $k \to \infty$ in probability.
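A minimal scalar sketch of iteration (1.2) follows, assuming $\xi_k = 0$, the gains $a_k = A/k$ and $b_k^2 = B/(k \log\log k)$, and a hypothetical tilted double well as the objective. The values of `A`, `B`, the start index `K0`, the radius of $D$ and the rule of discarding candidates that leave $D$ are illustrative choices, not the paper's experiment.

```python
import math
import random

A, B = 1.0, 4.0   # hypothetical gains; the text only requires B/A > C_0
K0 = 10           # first index used, chosen so that log(log(k)) > 0


def grad_U(x):
    # Gradient of the hypothetical tilted double well U(x) = (x^2 - 1)^2 + 0.3 x.
    return 4.0 * x * (x * x - 1.0) + 0.3


def a(k):
    return A / k


def b(k):
    # b_k^2 = B / (k log log k)
    return math.sqrt(B / (k * math.log(math.log(k))))


def anneal(x0, steps, seed=0, radius=2.0):
    # Iteration (1.2) with xi_k = 0. A candidate that leaves
    # D = [-radius, radius] is discarded, which keeps the trajectory bounded.
    rng = random.Random(seed)
    x = x0
    for k in range(K0, K0 + steps):
        candidate = x - a(k) * grad_U(x) + b(k) * rng.gauss(0.0, 1.0)
        if abs(candidate) <= radius:
            x = candidate
    return x


print(anneal(0.96, 20000))
```

Because the noise level decays only like $1/\log\log k$ relative to the step size, a single finite run gives no guarantee of ending near the global minimum; the convergence statement in the text is asymptotic and in probability.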

Our analysis of the convergence in probability of $\{X_k\}$ is based on the convergence of what we will call the associated stochastic differential equation (SDE)

$$dx(t) = -\nabla U(x(t))\,dt + c(t)\,dw(t) \qquad (1.3)$$

where $w(\cdot)$ is a standard $r$-dimensional Wiener process and $c(\cdot)$ is a positive function with $c(t) \to 0$ as $t \to \infty$ (take $t_k = \sum_{n=0}^{k-1} a_n$ and $b_k = \sqrt{a_k}\,c(t_k)$ to see the relationship between (1.2) and (1.3)). The simulation of the Markov diffusion $x(\cdot)$ for the purpose of global optimization has been called continuous simulated annealing. In this context, $U(x)$ is called the energy of state $x$ and $T(t) = c^2(t)/2$ is called the temperature at time $t$. This method was first suggested by Grenander [6] and Geman and Hwang [7] for image processing applications with continuous grey levels. We remark that the discrete simulated annealing algorithm for combinatorial optimization based on simulating a Metropolis-type Markov chain [8], and the continuous simulated annealing algorithm for multivariate optimization based on simulating the Langevin-type Markov diffusion discussed above, both have a (Gibbs) invariant distribution $\propto \exp(-U(x)/T)$ when the temperature is fixed at $T$. The invariant distributions concentrate on the global minima of $U(\cdot)$ as $T \to 0$. The discrete and continuous algorithms are further related in that a certain parametric family of continuous-state Metropolis-type Markov chains interpolated into continuous-time Markov processes converges to a Langevin-type Markov diffusion [9]. Now the asymptotic behavior of $x(\cdot)$ has been studied intensively by a number of researchers [7], [10]-[12]. Our work is based on the analysis of $x(\cdot)$ developed by Chiang, Hwang and Sheu [11], who prove the following result: if $U(\cdot)$ is well-behaved and $c^2(t) = C/\log t$ for $t$ large where $C > C_0$ (a positive constant which depends only on $U(\cdot)$ and the same $C_0$ as above) then $x(t) \to S^*$ as $t \to \infty$ in probability.
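The concentration of the Gibbs distribution $\propto \exp(-U(x)/T)$ on the global minima as $T \to 0$ can be checked numerically in one dimension. The sketch below uses a hypothetical tilted double well on $[-2, 2]$ and a plain Riemann sum; the function, the grid and the window half-width are assumptions made only for illustration.

```python
import math


def U(x):
    # Hypothetical energy: global minimum near x = -1.04, local minimum near 0.96.
    return (x * x - 1.0) ** 2 + 0.3 * x


def gibbs_mass_near_minimum(T, half_width=0.3, n=4001):
    # Riemann-sum approximation on [-2, 2] of the Gibbs mass, at temperature T,
    # of the window of the given half-width around the global minimizer.
    xs = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]
    u_min = min(U(x) for x in xs)
    x_star = min(xs, key=U)
    weights = [math.exp(-(U(x) - u_min) / T) for x in xs]
    near = sum(w for x, w in zip(xs, weights) if abs(x - x_star) <= half_width)
    return near / sum(weights)


masses = [gibbs_mass_near_minimum(T) for T in (1.0, 0.1, 0.01)]
print(masses)
```

The printed masses increase as the temperature is lowered, which is the mechanism that a decreasing temperature schedule exploits.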

The actual implementation of (1.3) on a digital computer requires some type of discretization or numerical integration, such as (1.2). Aluffi-Pentini, Parisi, and Zirilli [13] describe some numerical experiments performed with (1.2) for a variety of test problems. Kushner [12] was the first to analyze (1.2) but for the case of $a_k = b_k = A/\log k$, $k$ large. Although Kushner obtains a detailed asymptotic description of $\{X_k\}$ for this case, in general $X_k \not\to S^*$ as $k \to \infty$ in probability unless $\xi_k = 0$. The reason for this is intuitively clear: even if $\{\xi_k\}$ is bounded, $a_k \xi_k$ and $a_k W_k$ can be of the same order and hence can interfere with each other. On the other hand, by considering (1.2) for the case of $a_k = A/k$, $b_k^2 = B/(k \log\log k)$, $k$ large, we get $X_k \to S^*$ as $k \to \infty$ in probability for $\{\xi_k\}$ with unbounded variance, in particular for $E\{|\xi_k|^2\} = O(k^{\gamma})$ and $\gamma < 1$. Our method of analysis is different from Kushner's in that we obtain the asymptotic behavior of $\{X_k\}$ from $x(\cdot)$.

2. MAIN RESULTS AND DISCUSSION

We will use the following notation. If $F \subset \mathbb{R}^r$ then $\mathring{F}$ is the interior of $F$ and $\partial F$ is the boundary of $F$. $1_G(\cdot)$ is the indicator function for the set $G$. $|\cdot|$ and $\langle\cdot,\cdot\rangle$ are the Euclidean norm and inner product, respectively.

Our analysis, like Kushner's [12], requires that we bound the trajectories of $\{X_k\}$. We proceed as follows. Take $D$ to be a closed ball in $\mathbb{R}^r$ centered at the origin. Let $D_1$ be another closed ball in $\mathbb{R}^r$ centered at the origin with $D_1 \subset D$ (strictly). $D \setminus D_1$ will be a thin annulus where we modify (1.2), (1.3) to ensure that $\{X_k\}$ and $x(\cdot)$ are bounded. The actual algorithm is

$$\tilde X_{k+1} = X_k - a_k(\nabla U(X_k) + \xi_k) + b_k \sigma(X_k) W_k,$$

$$X_{k+1} = \tilde X_{k+1}\,1_D(\tilde X_{k+1}) + X_k\,1_{\mathbb{R}^r \setminus D}(\tilde X_{k+1}), \qquad (2.1)$$

and the associated SDE is

$$dx(t) = -\nabla U(x(t))\,dt + c(t)\,\sigma(x(t))\,dw(t). \qquad (2.2)$$

We will make assumptions on $U(\cdot)$ and $\sigma(\cdot)$ to force $\{X_k\}$ and $x(\cdot)$ to eventually stay in $D$ when they start in $D$.

In the sequel we make the following assumptions:

(A1) $U(\cdot)$ is a twice continuously differentiable function from $D$ to $[0,\infty)$ with $\min_{x \in D} U(x) = 0$ and $\langle \nabla U(x), x \rangle > 0$ for all $x \in D \setminus \mathring{D}_1$.

(A2) $\sigma(\cdot)$ is a Lipschitz continuous function from $D$ to $[0,1]$ with $\sigma(x) > 0$ for all $x \in \mathring{D}$, $\sigma(x) = 1$ for all $x \in D_1$, and $\sigma(x) = 0$ for all $x \in \partial D$.

(A3) $\{\xi_k\}$ is a sequence of $\mathbb{R}^r$-valued random variables; $\{W_k\}$ is a sequence of independent $r$-dimensional Gaussian random variables with zero mean and identity covariance matrix.

(A4) $a_k = \dfrac{A}{k}$, $b_k^2 = \dfrac{B}{k \log\log k}$, $k$ large, where $A, B > 0$.

(A5) $c^2(t) = \dfrac{C}{\log t}$, $t$ large, where $C > 0$.
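For every $k = 0, 1, \ldots$ let $\mathcal{F}_k$ be the $\sigma$-field generated by $\{X_0, \xi_0, \ldots, \xi_{k-1}, W_0, \ldots, W_{k-1}\}$.

(A6) $E\{|\xi_k|^2 \mid \mathcal{F}_k\} = O(a_k^{\alpha})$, $E\{\xi_k \mid \mathcal{F}_k\} = O(a_k^{\beta})$ and $|\xi_k|\,1_{D \setminus \mathring{D}_1}(X_k) \to 0$ as $k \to \infty$, uniformly w.p.1; $W_k$ is independent of $\mathcal{F}_k$ for all $k$.

For every $\epsilon > 0$ let

$$\pi^{\epsilon}(x) = \frac{1}{Z^{\epsilon}} \exp\left(-\frac{2U(x)}{\epsilon^2}\right) 1_D(x), \qquad Z^{\epsilon} = \int_D \exp\left(-\frac{2U(x)}{\epsilon^2}\right) dx.$$

(A7) $\pi^{\epsilon}$ has a unique weak limit $\pi$ as $\epsilon \to 0$.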


A few remarks about these assumptions are in order. First, it is clear that $\pi$ concentrates on $S^*$, the global minima of $U(\cdot)$. The existence of $\pi$ and a simple characterization in terms of the Hessian of $U(\cdot)$ is discussed in [14]. Also, it is clear that $P\{\bigcap_{t \ge 0}\{x(t) \in D\}\} = 1$ when $x(0) \in D$, and it can be shown that $P\{\bigcup_n \bigcap_{k \ge n}\{\tilde X_k \in D\}\} = 1$ when $X_0 \in D$ and $\alpha > -1$ (see the Remark following Proposition 1 in Section 3). Finally, we point out that a penalty function can be added to $U(\cdot)$ so that $\nabla U(\cdot)$ points outward in the annulus $D \setminus \mathring{D}_1$ as in (A1). However, the condition that $\xi_k$ tends to zero in the annulus $D \setminus \mathring{D}_1$ as in (A6) can be a significant restriction.
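One iteration of the truncated algorithm (2.1) can be sketched as follows. The radii `R` and `R1`, the particular piecewise-linear `sigma` (one Lipschitz function compatible with (A2)) and the quadratic test objective are assumptions chosen for illustration; the paper does not prescribe them.

```python
import math
import random

R, R1 = 2.0, 1.8   # hypothetical radii of the balls D and D_1


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def sigma(x):
    # One Lipschitz choice consistent with (A2): equal to 1 on D_1, equal to 0
    # on the boundary of D, and linear in |x| across the annulus.
    return max(0.0, min(1.0, (R - norm(x)) / (R - R1)))


def step(x, grad, noise, a_k, b_k, rng):
    # One iteration of (2.1): form the candidate and keep it only if it is in D.
    s = sigma(x)
    g = grad(x)
    candidate = [v - a_k * (gv + e) + b_k * s * rng.gauss(0.0, 1.0)
                 for v, gv, e in zip(x, g, noise)]
    return candidate if norm(candidate) <= R else x


rng = random.Random(0)
x = [1.0, -1.0]
for k in range(10, 1010):
    a_k = 1.0 / k
    b_k = math.sqrt(4.0 / (k * math.log(math.log(k))))
    # grad(p) = p is the gradient of U(p) = |p|^2 / 2; xi_k is taken to be zero.
    x = step(x, lambda p: list(p), [0.0, 0.0], a_k, b_k, rng)
print(x, norm(x))
```

Since a rejected candidate leaves the iterate unchanged, the sequence started in $D$ never leaves $D$, while `sigma` switches the injected noise off near the boundary.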

For a process $u(\cdot)$ and function $f(\cdot)$, let $E_{t_1,u_1}\{f(u(t))\}$ denote conditional expectation with respect to $u(t_1) = u_1$ and let $E_{t_1,u_1;t_2,u_2}\{f(u(t))\}$ denote conditional expectation with respect to $u(t_1) = u_1$ and $u(t_2) = u_2$. Also for a measure $\mu(\cdot)$ and a function $f(\cdot)$ let $\mu(f) = \int f\,d\mu$.

By a modification of the main result of [11] we have that there exists a constant $C_0$ such that for $C > C_0$ and any bounded and continuous function $f(\cdot)$ on $\mathbb{R}^r$

$$\lim_{t \to \infty} E_{0,x}\{f(x(t))\} = \pi(f) \qquad (2.3)$$

uniformly for $x \in D$. In [11] the constant $C_0$ is denoted by $c_0$ and has an interpretation in terms of the action functional for the dynamical system $\dot z(t) = -\nabla U(z(t))$. Here is our theorem on the convergence of $\{X_k\}$.

Theorem: Let $\alpha > -1$, $\beta > 0$, and $B/A > C_0$. Then for any bounded continuous function $f(\cdot)$ on $\mathbb{R}^r$

$$\lim_{k \to \infty} E_{0,x}\{f(X_k)\} = \pi(f) \qquad (2.4)$$

uniformly for $x \in D$.

Since $\pi$ concentrates on $S^*$, (2.3) and (2.4) imply $x(t) \to S^*$ and $X_k \to S^*$ in probability, respectively.

The proof of the theorem requires the following three Lemmas. Let $\{t_k\}$ and $\beta(\cdot)$ be defined by

$$t_k = \sum_{n=0}^{k-1} a_n, \qquad k = 0, 1, \ldots,$$

$$\int_s^{\beta(s)} \frac{\log s}{\log u}\,du = s^{2/3}, \qquad s > 1.$$


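Lemma 1: Let $\alpha > -1$, $\beta > 0$, and $B/A = C$. Then there exists $\gamma > 1$ such that for any bounded continuous function $f(\cdot)$ on $\mathbb{R}^r$

$$\lim_{n \to \infty}\ \sup_{k:\, t_n \le t_k \le \gamma t_n} \left| E_{0,x;n,y}\{f(X_k)\} - E_{t_n,y}\{f(x(t_k))\} \right| = 0$$

uniformly for $x, y \in D$.

Lemma 2: For any bounded continuous function $f(\cdot)$ on $\mathbb{R}^r$

$$\lim_{n \to \infty}\ \sup_{s:\, t_n \le s \le t_{n+1}} \left| E_{t_n,y}\{f(x(\beta(s)))\} - E_{s,y}\{f(x(\beta(s)))\} \right| = 0$$

uniformly for $y \in D$.

Lemma 3: Let $C > C_0$. Then for any bounded continuous function $f(\cdot)$ on $\mathbb{R}^r$

$$\lim_{s \to \infty} \left| E_{s,y}\{f(x(\beta(s)))\} - \pi^{c(s)}(f) \right| = 0$$

uniformly for $y \in D$.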

The  proofs  of  Lemmas  1  and  2  are  in  Section  3.  Lemma  3  is  a  modification  of 
results  in  [11,  Lemmas  2,  3].  Note  how  the  Lemmas  are  concerned  with  nonuniform 
approximation  on  intervals  of  increasing  length,  as  opposed  to  uniform  approximation 
on  intervals  of  fixed  length. 

We  now  show  how  the  Lemmas  may  be  combined  to  prove  the  Theorem. 
Proof of Theorem: Note that $\beta(s)$ is a strictly increasing function and $s + s^{2/3} \le \beta(s) \le s + 2s^{2/3}$ for $s$ large enough. Hence for $k$ large enough one can choose $s$ such that $t_k = \beta(s)$. Clearly $s < t_k$ and $s \to \infty$ as $k \to \infty$. Furthermore, for $k$ and hence $s$ large enough one can choose $n$ such that $t_n \le t_k \le \gamma t_n$ and $t_n \le s \le t_{n+1}$. Clearly $n < k$ and $n \to \infty$ as $k \to \infty$. Let $p(0,x;n,A) = P\{X_n \in A \mid X_0 = x\}$. We can write

$$E_{0,x}\{f(X_k)\} - \pi(f) = \int_D p(0,x;n,dy)\left(E_{0,x;n,y}\{f(X_k)\} - \pi(f)\right). \qquad (2.5)$$

Now

$$E_{0,x;n,y}\{f(X_k)\} - \pi(f) = E_{0,x;n,y}\{f(X_k)\} - E_{t_n,y}\{f(x(t_k))\}$$

$$\quad + E_{t_n,y}\{f(x(\beta(s)))\} - E_{s,y}\{f(x(\beta(s)))\}$$

$$\quad + E_{s,y}\{f(x(\beta(s)))\} - \pi^{c(s)}(f)$$

$$\quad + \pi^{c(s)}(f) - \pi(f) \to 0 \quad \text{as } k \to \infty \qquad (2.6)$$

uniformly for $x, y \in D$ by Lemmas 1-3 and (A7). Combining (2.5) and (2.6) completes the proof.

□

As an illustration of our Theorem, we examine the random directions version of (1.2) that was implemented in [13]. If we could make noiseless measurements of $\nabla U(X_k)$ then we could use the algorithm

$$X_{k+1} = X_k - a_k \nabla U(X_k) + b_k W_k \qquad (2.7)$$

(modified as in (2.1)). Suppose that $\nabla U(X_k)$ is not available but we can make noiseless measurements of $U(\cdot)$. Suppose we replace $\nabla U(X_k)$ in (2.7) by a forward finite difference approximation of $\nabla U(X_k)$, which would require $r + 1$ evaluations of $U(\cdot)$. It can be shown that such an algorithm can be written in the form of (1.2) with $\xi_k = O(c_k)$ where $\{c_k\}$ are the finite difference intervals ($c_k \to 0$). As an alternative, suppose that at each iteration a direction $d_k$ is chosen at random and we replace $\nabla U(X_k)$ in (2.7) by a finite difference approximation of the directional derivative $\langle \nabla U(X_k), d_k \rangle d_k$ in the direction $d_k$, which only requires 2 evaluations of $U(\cdot)$. Conceivably, fewer evaluations of $U(\cdot)$ would be required by such a random directions algorithm to converge. Now assume that the $\{d_k\}$ are random vectors each distributed uniformly over the surface of the $(r-1)$-dimensional sphere and that $d_k$ is independent of $X_0, W_0, \ldots, W_{k-1}, d_0, \ldots, d_{k-1}$. By analysis similar to [4, p. 58-60] it can be shown that such a random directions algorithm can be written in the form of (1.2) with $E\{\xi_k \mid \mathcal{F}_k\} = O(c_k)$ and $\xi_k = O(1)$. Hence the conditions of the Theorem will be satisfied and convergence will be obtained provided that the finite difference approximation of $\nabla U(X_k)$ is used in the thin annulus $D \setminus \mathring{D}_1$ and $c_k = O(k^{-\beta})$ for some $\beta > 0$.
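The random directions estimate described above can be sketched in a few lines. The test function `U`, the point `x` and the interval `c` are hypothetical; the direction is drawn by normalizing a Gaussian vector, which is one standard way to sample uniformly from the sphere and is an implementation choice rather than something specified in the text.

```python
import math
import random


def random_direction(r, rng):
    # Normalizing a standard Gaussian vector gives a direction that is
    # uniformly distributed on the unit sphere in R^r.
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(r)]
        length = math.sqrt(sum(v * v for v in g))
        if length > 0.0:
            return [v / length for v in g]


def directional_estimate(U, x, d, c):
    # Forward difference along d: two evaluations of U, as in the text.
    shifted = [xi + c * di for xi, di in zip(x, d)]
    slope = (U(shifted) - U(x)) / c
    return [slope * di for di in d]


def U(x):
    # Hypothetical smooth test function.
    return sum((v * v - 1.0) ** 2 for v in x)


rng = random.Random(0)
x = [0.5, -0.3, 1.2]
d = random_direction(len(x), rng)
print(directional_estimate(U, x, d, c=1e-4))
```

The returned vector approximates $\langle \nabla U(x), d \rangle d$ with an error of order `c`, which is the source of the $O(c_k)$ bias term in the text.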

Our Theorem, like Kushner's [12], requires that the trajectories of $\{X_k\}$ be bounded. However, there is a version of Lemma 3 in [11] which applies with $D = \mathbb{R}^r$ assuming certain growth conditions on $U(\cdot)$. We are currently trying to obtain versions of Lemmas 1 and 2 which also hold for $D = \mathbb{R}^r$. On the other hand, we have found that bounding the trajectories of $\{X_k\}$ seems useful and even necessary in practice. The reason is that even with the specified growth conditions $|X_k|$ tends occasionally to very large values, which leads to numerical problems in the simulation.

There  are  many  hard  multivariate  optimization  problems  where  the  simulated 
annealing  type  algorithms  discussed  in  this  paper  might  be  applied.  Recently  there  has 
been a lot of interest in learning algorithms for artificial neural networks. In particular
the  so-called  backpropagation  algorithm  has  emerged  as  a  popular  method  for  training 
multilayer  perceptron  networks  [15].  Backpropagation  is  a  stochastic  descent  algorithm 
and  as  such  is  subject  to  getting  trapped  in  local  minima.  It  would  be  interesting  to 
determine  whether  a  simulated  annealing  type  backpropagation  algorithm  where  slowly 
decreasing  noise  has  been  added  in  artificially  can  alleviate  this  problem. 


3. PROOFS OF LEMMAS 1 AND 2

Throughout this section it will be convenient to make the following assumption in place of (A5):

(A5') $c^2(t_k) = \dfrac{C}{\log\log k}$, $k$ large, where $C > 0$, and $c^2(\cdot)$ is a piecewise linear interpolation of $\{c^2(t_k)\}$.

Note that under (A5') $c^2(t) \sim C/\log t$ as $t \to \infty$, and if $B/A = C$ then $b_k = \sqrt{a_k}\,c(t_k)$ for $k$ large enough. The results are unchanged whether we assume (A5) or (A5'). We shall also assume that $a_k$, $b_k$ and $c(t)$ are all bounded above by 1. In the sequel $c_1, c_2, \ldots$ will denote positive constants whose value may change from proof to proof.
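The relation between the discrete gains and the continuous-time schedule can be checked numerically. The sketch below assumes $a_k = A/k$ for all $k \ge 1$ (so the sum defining $t_k$ is started at $n = 1$), $c^2(t) = C/\log t$ as in (A5), and $B = AC$; the numerical values of `A` and `C` are arbitrary.

```python
import math

A, C = 1.0, 4.0
B = A * C   # the case B/A = C


def ratio(k):
    # Ratio of a_k c^2(t_k) to b_k^2 = B / (k log log k), where
    # t_k = a_1 + ... + a_{k-1} and c^2(t) = C / log t.
    t_k = sum(A / n for n in range(1, k))
    b_sq = B / (k * math.log(math.log(k)))
    return (A / k) * (C / math.log(t_k)) / b_sq


print(ratio(100), ratio(100000))
```

The ratio approaches 1 as $k$ grows, although only at a logarithmic rate, which is consistent with $c^2(t) \sim C/\log t$ under (A5').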

We  start  with  several  Propositions. 

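Proposition 1:

$$P\{\tilde X_{k+1} \notin D \mid \mathcal{F}_k\} = O(a_k^{2+\alpha}) \quad \text{as } k \to \infty,$$

uniformly w.p.1.

Proof: Let $r_k = \sqrt{k}$, $k = 0, 1, \ldots$. We can then write

$$P\{\tilde X_{k+1} \notin D \mid \mathcal{F}_k\} = P\{\tilde X_{k+1} \notin D,\ |W_k| > r_k \mid \mathcal{F}_k\}$$

$$\quad + P\{\tilde X_{k+1} \notin D,\ |W_k| \le r_k \mid \mathcal{F}_k\}\,1_{\mathring{D}_1}(X_k)$$

$$\quad + P\{\tilde X_{k+1} \notin D,\ |W_k| \le r_k \mid \mathcal{F}_k\}\,1_{D \setminus \mathring{D}_1}(X_k). \qquad (3.1)$$

We bound each term on the r.h.s. of (3.1) as follows.

First, we have

$$P\{\tilde X_{k+1} \notin D,\ |W_k| > r_k \mid \mathcal{F}_k\} \le P\{|W_k| > r_k\} \le r \exp\left(-\frac{r_k^2}{2r}\right) = o(a_k^{2+\alpha}) \quad \text{as } k \to \infty. \qquad (3.2)$$

Here we have adapted the standard estimate $P\{\eta > x\} \le \frac{1}{2}\exp(-x^2/2)$ for $x \ge 0$, where $\eta$ is a scalar zero-mean unit-variance Gaussian random variable.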
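Next, we show that

$$P\{\tilde X_{k+1} \notin D,\ |W_k| \le r_k \mid \mathcal{F}_k\}\,1_{\mathring{D}_1}(X_k) = O(a_k^{2+\alpha}) \quad \text{as } k \to \infty. \qquad (3.3)$$

Let $X_k \in \mathring{D}_1$. Let $\epsilon_1 = \inf_{x \in D_1,\, y \in \partial D} |x - y| > 0$ and $0 < \epsilon_2 < \epsilon_1$. Then

$$P\{\tilde X_{k+1} \notin D,\ |W_k| \le r_k \mid \mathcal{F}_k\} \le P\{|-a_k(\nabla U(X_k) + \xi_k) + b_k W_k| > \epsilon_1,\ |W_k| \le r_k \mid \mathcal{F}_k\}$$

$$\quad \le P\{a_k |\xi_k| > \epsilon_2 \mid \mathcal{F}_k\} \le \frac{a_k^2\,E\{|\xi_k|^2 \mid \mathcal{F}_k\}}{\epsilon_2^2} = O(a_k^{2+\alpha}) \quad \text{as } k \to \infty.$$

The second inequality follows from the fact that $b_k r_k \to 0$ as $k \to \infty$, and the third inequality is Chebyshev's. This proves (3.3).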

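Finally, we show that

$$P\{\tilde X_{k+1} \notin D,\ |W_k| \le r_k \mid \mathcal{F}_k\}\,1_{D \setminus \mathring{D}_1}(X_k) = 0 \qquad (3.4)$$

for $k$ large enough. Let $X_k \in D \setminus \mathring{D}_1$. Let $\bar X_k = X_k + b_k \sigma(X_k) W_k 1_{\{|W_k| \le r_k\}}$. Since $\sigma(\cdot)$ is Lipschitz, $\sigma(x) > 0$ for all $x \in \mathring{D}$, and $\sigma(x) = 0$ for all $x \in \partial D$, we have $\sigma(x) \le c_1 \inf_{y \in \partial D} |x - y|$ for all $x \in D$. Hence $|\bar X_k - X_k| \le b_k r_k c_1 \inf_{y \in \partial D} |X_k - y|$, and since $b_k r_k \to 0$ as $k \to \infty$ we get $\bar X_k - X_k \to 0$ as $k \to \infty$ and $\bar X_k \in D$ for $k$ large enough. Now since $X_k \in D \setminus \mathring{D}_1$ we have $\langle \nabla U(X_k), X_k \rangle \ge c_2$ and $\xi_k \to 0$ as $k \to \infty$. Hence $\langle \nabla U(X_k) + \xi_k, \bar X_k \rangle - \langle \nabla U(X_k) + \xi_k, X_k \rangle \to 0$ as $k \to \infty$ and $\langle \nabla U(X_k) + \xi_k, X_k \rangle \ge c_2 > 0$ for $k$ large enough, and so $\langle \nabla U(X_k) + \xi_k, \bar X_k \rangle \ge c_2 > 0$ for $k$ large enough, and consequently

$$\frac{\langle a_k(\nabla U(X_k) + \xi_k), \bar X_k \rangle}{|a_k(\nabla U(X_k) + \xi_k)|\,|\bar X_k|} \ge c_3 > 0$$

for $k$ large enough. But $\tilde X_{k+1} = \bar X_k - a_k(\nabla U(X_k) + \xi_k) \in D$ whenever $\bar X_k \in D$ and $|a_k(\nabla U(X_k) + \xi_k)| \le c_3 \cdot \operatorname{diam} D$, and these hold for $k$ large enough. This proves (3.4). Combining (3.1)-(3.4) completes the proof.

□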
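The scalar Gaussian tail estimate $P\{\eta > x\} \le \frac{1}{2}\exp(-x^2/2)$ quoted in the proof above can be checked against the exact tail, which the Python standard library exposes through the complementary error function. The grid of test points is an arbitrary choice.

```python
import math


def gaussian_tail(x):
    # P{eta > x} for a standard Gaussian eta, via the complementary error function.
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_bound(x):
    # The estimate (1/2) exp(-x^2 / 2), valid for x >= 0.
    return 0.5 * math.exp(-x * x / 2.0)


for x in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(x, gaussian_tail(x), tail_bound(x))
```

The two quantities coincide at $x = 0$ and the bound dominates for every larger $x$ in the table.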
Remark: By Proposition 1 and the Borel-Cantelli Lemma, $P\{\bigcup_n \bigcap_{k \ge n}\{\tilde X_k \in D\}\} = 1$ when $X_0 \in D$ and $\alpha > -1$.

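Proposition 2: For each $n$ let $\{u_{n,k}\}_{k \ge n}$ be a sequence of nonnegative numbers such that

$$u_{n,k+1} \le (1 + c\,a_k)\,u_{n,k} + c\,a_k^{\delta}, \qquad k \ge n,$$

$$u_{n,n} = O(a_n^{\epsilon}) \quad \text{as } n \to \infty,$$

where $\delta > 1$, $\epsilon > 0$, and $c > 0$. Then there exists a $\gamma > 1$ such that

$$\lim_{n \to \infty}\ \sup_{k:\, t_n \le t_k \le \gamma t_n} u_{n,k} = 0.$$

Proof: We may set $c = 1$ since $a_k = A/k$ for $k$ large and $A > 0$ is arbitrary. Now

$$u_{n,k} \le u_{n,n} \prod_{m=n}^{k-1}(1 + a_m) + \sum_{m=n}^{k-1} a_m^{\delta} \prod_{l=m+1}^{k-1}(1 + a_l) \le \left(u_{n,n} + \sum_{m=n}^{k-1} a_m^{\delta}\right) \exp\left(\sum_{m=n}^{k-1} a_m\right),$$

since $1 + x \le e^x$ for all $x$. Also $\sum_{m=n}^{k-1} a_m \le A(\log(k/n) + 1/n)$ and $\sum_{m=n}^{k-1} a_m^{\delta} \le A^{\delta}\left(1/((\delta - 1)n^{\delta-1}) + 1/n^{\delta}\right)$, and if $t_k \le \gamma t_n$ then $k \le c_1 n^{\gamma}$. Choose $\gamma$ such that $1 < \gamma < 1 + \min\{\delta - 1, \epsilon\}/A$. It follows that

$$\sup_{k:\, t_n \le t_k \le \gamma t_n} u_{n,k} \le c_2\left(\frac{1}{n^{\epsilon - (\gamma-1)A}} + \frac{1}{n^{\delta - 1 - (\gamma-1)A}}\right) \to 0 \quad \text{as } n \to \infty.$$

□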
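The conclusion of Proposition 2 can be illustrated by iterating the recursion with equality, which is its worst case. The parameter values are hypothetical and satisfy $1 < \gamma < 1 + \min\{\delta - 1, \epsilon\}/A$; the index range $t_n \le t_k \le \gamma t_n$ is replaced by the stand-in $n \le k \le n^{\gamma}$, which is an assumption motivated by $t_k \approx A \log k$.

```python
def sup_u(n, A=1.0, delta=1.5, eps=1.0, gamma=1.2):
    # Worst case of the recursion with c = 1: equality at every step,
    # started from u_{n,n} = a_n^eps, tracked over n <= k <= n^gamma.
    u = (A / n) ** eps
    best = u
    for k in range(n, int(n ** gamma)):
        a_k = A / k
        u = (1.0 + a_k) * u + a_k ** delta
        best = max(best, u)
    return best


sups = [sup_u(n) for n in (100, 1000, 10000)]
print(sups)
```

The supremum over the growing index window shrinks as $n$ increases, even though the window itself gets longer; this is the non-uniform approximation over intervals of increasing length that the Lemmas rely on.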


Define $\xi(\cdot,\cdot)$ by

$$x(t) = x(s) - (t - s)(\nabla U(x(s)) + \xi(s,t)) + c(s)\,\sigma(x(s))\,(w(t) - w(s))$$

for $t \ge s \ge 0$.

Proposition 3:

$$E\{\xi(t,t+h) \mid x(t)\} = O(h^{1/2}),$$

$$E\{|\xi(t,t+h)|^2 \mid x(t)\} = O(1),$$

as $h \to 0$, uniformly for a.e. $x(t) \in D$ and all $t \ge 0$.
Proof: We use some elementary facts about stochastic integrals and martingales (cf. [16]). First write

$$h\,\xi(t,t+h) = \int_t^{t+h} (\nabla U(x(\tau)) - \nabla U(x(t)))\,d\tau - \int_t^{t+h} (c(\tau)\sigma(x(\tau)) - c(t)\sigma(x(t)))\,dw(\tau). \qquad (3.5)$$

Now a standard result is that

$$E\{|x(t+h) - x(t)|^2 \mid x(t)\} = O(h)$$

as $h \to 0$, uniformly for a.e. $x(t) \in D$ and $t$ in a finite interval. In fact, under our assumptions the estimate is uniform here for a.e. $x(t) \in D$ and all $t \ge 0$. Let $K_1, K_2$ be Lipschitz constants for $\nabla U(\cdot)$, $\sigma(\cdot)$, respectively. Also note that $c(\cdot)$ is piecewise continuously differentiable with bounded derivative (where it exists) and hence is also Lipschitz continuous, say with constant $K_3$. Hence

$$E\left\{\left|\int_t^{t+h} (\nabla U(x(\tau)) - \nabla U(x(t)))\,d\tau\right|^2 \,\Big|\, x(t)\right\} \le K_1^2\,E\left\{\left(\int_t^{t+h} |x(\tau) - x(t)|\,d\tau\right)^2 \,\Big|\, x(t)\right\}$$

$$\quad \le K_1^2\,h \int_t^{t+h} E\{|x(\tau) - x(t)|^2 \mid x(t)\}\,d\tau = O(h^3) \qquad (3.6)$$

and

$$E\left\{\left|\int_t^{t+h} (c(\tau)\sigma(x(\tau)) - c(t)\sigma(x(t)))\,dw(\tau)\right|^2 \,\Big|\, x(t)\right\} = \int_t^{t+h} E\{|c(\tau)\sigma(x(\tau)) - c(t)\sigma(x(t))|^2 \mid x(t)\}\,d\tau$$

$$\quad \le 2K_2^2 \int_t^{t+h} E\{|x(\tau) - x(t)|^2 \mid x(t)\}\,d\tau + 2K_3^2 \int_t^{t+h} (\tau - t)^2\,d\tau = O(h^2) \qquad (3.7)$$

as $h \to 0$, uniformly for a.e. $x(t) \in D$ and all $t \ge 0$. The Proposition follows easily from (3.5)-(3.7) and the fact that the second (stochastic) integral in (3.5) defines a martingale as $h$ varies.

□

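Now in Lemma 1 we compare the distributions of $X_k$ and $x(t_k)$. This is done most easily by comparing $X_k$ and $x(t_k)$ to two versions of an auxiliary sequence $\{Y_k\}$ (defined below, before Lemmas 1.1 and 1.2), respectively, which are equal in distribution.

Let

$$\tilde Y_{k+1} = Y_k - a_k \nabla U(Y_k) + b_k \sigma(Y_k) W_k,$$

$$Y_{k+1} = \tilde Y_{k+1}\,1_D(\tilde Y_{k+1}) + Y_k\,1_{\mathbb{R}^r \setminus D}(\tilde Y_{k+1}).$$

Lemma 1.1: There exists $\gamma > 1$ such that for any bounded and continuous function $f(\cdot)$ on $\mathbb{R}^r$

$$\lim_{n \to \infty}\ \sup_{k:\, t_n \le t_k \le \gamma t_n} \left| E_{0,x;n,y}\{f(X_k)\} - E_{n,y}\{f(Y_k)\} \right| = 0,$$

uniformly for $x, y \in D$.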

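Proof: Let $x, y \in D$, $n$ a positive integer, $X_0 = x$, and $X_n = Y_n = y$. Let $\Delta_k = X_k - Y_k$ for $k \ge n$. We suppress the dependence of $\Delta_k$ on $x$, $y$ and $n$. Write

$$E\{|\Delta_{k+1}|^2\} = E\{|\Delta_{k+1}|^2\,1_{\{\tilde X_{k+1} \notin D\} \cup \{\tilde Y_{k+1} \notin D\}}\} + E\{|\Delta_{k+1}|^2\,1_{\{\tilde X_{k+1} \in D\} \cap \{\tilde Y_{k+1} \in D\}}\}. \qquad (3.8)$$

We estimate the first term in (3.8) as follows. We have by Proposition 1 that

$$E\{|\Delta_{k+1}|^2\,1_{\{\tilde X_{k+1} \notin D\} \cup \{\tilde Y_{k+1} \notin D\}}\} \le c_1\left(P\{\tilde X_{k+1} \notin D\} + P\{\tilde Y_{k+1} \notin D\}\right) = O(a_k^{2+\alpha}) \quad \text{as } k \to \infty, \qquad (3.9)$$

uniformly for $x, y \in D$.

We estimate the second term in (3.8) as follows. If $\tilde X_{k+1} \in D$ and $\tilde Y_{k+1} \in D$ then

$$\Delta_{k+1} = \Delta_k - a_k(\nabla U(Y_k + \Delta_k) - \nabla U(Y_k)) + b_k(\sigma(Y_k + \Delta_k) - \sigma(Y_k))W_k - a_k \xi_k.$$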
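Hence

$$E\{|\Delta_{k+1}|^2\,1_{\{\tilde X_{k+1} \in D\} \cap \{\tilde Y_{k+1} \in D\}}\} \le E\{|\Delta_k - a_k(\nabla U(Y_k + \Delta_k) - \nabla U(Y_k)) + b_k(\sigma(Y_k + \Delta_k) - \sigma(Y_k))W_k - a_k \xi_k|^2\}$$

$$\quad \le E\{|\Delta_k|^2\} + a_k^2\,E\{|\nabla U(Y_k + \Delta_k) - \nabla U(Y_k)|^2\}$$

$$\quad + a_k\,E\{|(\sigma(Y_k + \Delta_k) - \sigma(Y_k))W_k|^2\}$$

$$\quad + a_k^2\,E\{|\xi_k|^2\}$$

$$\quad + 2a_k\,|E\{\langle \Delta_k, \nabla U(Y_k + \Delta_k) - \nabla U(Y_k) \rangle\}|$$

$$\quad + 2a_k^{1/2}\,|E\{\langle \Delta_k, (\sigma(Y_k + \Delta_k) - \sigma(Y_k))W_k \rangle\}|$$

$$\quad + 2a_k\,|E\{\langle \Delta_k, \xi_k \rangle\}|$$

$$\quad + 2a_k^{3/2}\,|E\{\langle \nabla U(Y_k + \Delta_k) - \nabla U(Y_k), (\sigma(Y_k + \Delta_k) - \sigma(Y_k))W_k \rangle\}|$$

$$\quad + 2a_k^2\,|E\{\langle \nabla U(Y_k + \Delta_k) - \nabla U(Y_k), \xi_k \rangle\}|$$

$$\quad + 2a_k^{3/2}\,|E\{\langle (\sigma(Y_k + \Delta_k) - \sigma(Y_k))W_k, \xi_k \rangle\}|, \qquad (3.10)$$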

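for all $x, y \in D$, $k \ge n$, and $n$ large enough. Let $K_1, K_2$ be Lipschitz constants for $\nabla U(\cdot)$, $\sigma(\cdot)$, respectively. Using the facts that $X_k$, $Y_k$ and hence $\Delta_k$ are $\mathcal{F}_k$ measurable, $W_k$ is independent of $\mathcal{F}_k$ and

$$E\{|\xi_k|^2 \mid \mathcal{F}_k\} \le c_2 a_k^{\alpha}, \qquad |E\{\xi_k \mid \mathcal{F}_k\}| \le c_2 a_k^{\beta},$$

w.p.1 for all $x, y \in D$, $k \ge n$, and $n$ large enough, we have

$$E\{|\nabla U(Y_k + \Delta_k) - \nabla U(Y_k)|^2\} \le K_1^2\,E\{|\Delta_k|^2\}$$

$$E\{|(\sigma(Y_k + \Delta_k) - \sigma(Y_k))W_k|^2\} \le r K_2^2\,E\{|\Delta_k|^2\}$$

$$E\{|\xi_k|^2\} \le c_2 a_k^{\alpha}$$

$$|E\{\langle \Delta_k, \nabla U(Y_k + \Delta_k) - \nabla U(Y_k) \rangle\}| \le K_1\,E\{|\Delta_k|^2\}$$

$$|E\{\langle \Delta_k, (\sigma(Y_k + \Delta_k) - \sigma(Y_k))W_k \rangle\}| = |E\{\langle \Delta_k, (\sigma(Y_k + \Delta_k) - \sigma(Y_k))E\{W_k\} \rangle\}| = 0$$

$$|E\{\langle \Delta_k, \xi_k \rangle\}| = |E\{\langle \Delta_k, E\{\xi_k \mid \mathcal{F}_k\} \rangle\}| \le E\{|\Delta_k|\,|E\{\xi_k \mid \mathcal{F}_k\}|\} \le c_2 a_k^{\beta}\,E\{|\Delta_k|\}$$

$$|E\{\langle \nabla U(Y_k + \Delta_k) - \nabla U(Y_k), (\sigma(Y_k + \Delta_k) - \sigma(Y_k))W_k \rangle\}| = |E\{\langle \nabla U(Y_k + \Delta_k) - \nabla U(Y_k), (\sigma(Y_k + \Delta_k) - \sigma(Y_k))E\{W_k\} \rangle\}| = 0$$

$$|E\{\langle \nabla U(Y_k + \Delta_k) - \nabla U(Y_k), \xi_k \rangle\}| = |E\{\langle \nabla U(Y_k + \Delta_k) - \nabla U(Y_k), E\{\xi_k \mid \mathcal{F}_k\} \rangle\}|$$

$$\quad \le E\{|\nabla U(Y_k + \Delta_k) - \nabla U(Y_k)|\,|E\{\xi_k \mid \mathcal{F}_k\}|\} \le c_2 K_1 a_k^{\beta}\,E\{|\Delta_k|\}$$

$$|E\{\langle (\sigma(Y_k + \Delta_k) - \sigma(Y_k))W_k, \xi_k \rangle\}| = |E\{(\sigma(Y_k + \Delta_k) - \sigma(Y_k))\,E\{\langle W_k, \xi_k \rangle \mid \mathcal{F}_k\}\}|$$

$$\quad \le E\{|\sigma(Y_k + \Delta_k) - \sigma(Y_k)|\,E\{|W_k|^2\}^{1/2}\,E\{|\xi_k|^2 \mid \mathcal{F}_k\}^{1/2}\} \le \sqrt{r c_2}\,K_2\,a_k^{\alpha/2}\,E\{|\Delta_k|\}$$

for all $x, y \in D$, $k \ge n$, and $n$ large enough. Substituting these expressions into (3.10) gives (after some simplification)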

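$$E\{|\Delta_{k+1}|^2\,1_{\{\tilde X_{k+1} \in D\} \cap \{\tilde Y_{k+1} \in D\}}\} \le (1 + c_3 a_k)\,E\{|\Delta_k|^2\} + c_3 a_k^{\delta_1}\,E\{|\Delta_k|\} + c_2 a_k^{2+\alpha}$$

$$\quad \le (1 + c_3 a_k)\,E\{|\Delta_k|^2\} + c_3 a_k^{\delta_1}\,E\{|\Delta_k|^2\}^{1/2} + c_2 a_k^{2+\alpha}$$

$$\quad \le (1 + c_4 a_k)\,E\{|\Delta_k|^2\} + c_4 a_k^{\delta_2}, \qquad (3.11)$$

for all $x, y \in D$, $k \ge n$, and $n$ large enough, where $\delta_1 = \min\{1 + \beta, (3 + \alpha)/2\} > 1$ and $\delta_2 = \min\{\delta_1, 2 + \alpha\} > 1$ since $\alpha > -1$ and $\beta > 0$.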

Now combine (3.8), (3.9) and (3.11) to get

$$E\{|\Delta_{k+1}|^2\} \le (1 + c_5 a_k)\,E\{|\Delta_k|^2\} + c_5 a_k^{\delta_2}, \qquad k \ge n,$$

$$E\{|\Delta_n|^2\} = 0,$$

for all $x, y \in D$ and $n$ large enough. Applying Proposition 2 there exists $\gamma > 1$ such that

$$\lim_{n \to \infty}\ \sup_{k:\, t_n \le t_k \le \gamma t_n} E\{|\Delta_k|^2\} = 0, \qquad (3.12)$$

uniformly for all $x, y \in D$.

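Finally, let $f(\cdot)$ be a bounded continuous function on $\mathbb{R}^r$. Since $D$ is compact $f(\cdot)$ is uniformly continuous on $D$. So given $\epsilon > 0$ let $\delta > 0$ be such that $|f(u) - f(v)| < \epsilon$ whenever $|u - v| < \delta$ and $u, v \in D$. Then

$$|E_{0,x;n,y}\{f(X_k)\} - E_{n,y}\{f(Y_k)\}| \le \epsilon\,P\{|\Delta_k| < \delta\} + 2\|f\|\,P\{|\Delta_k| \ge \delta\} \le \epsilon + \frac{2\|f\|}{\delta^2}\,E\{|\Delta_k|^2\},$$

and by (3.12)

$$\limsup_{n \to \infty}\ \sup_{k:\, t_n \le t_k \le \gamma t_n} |E_{0,x;n,y}\{f(X_k)\} - E_{n,y}\{f(Y_k)\}| \le \epsilon,$$

uniformly for $x, y \in D$, and letting $\epsilon \to 0$ completes the proof.

□

Let $W_k = (w(t_{k+1}) - w(t_k))/\sqrt{a_k}$ and

$$\tilde Y_{k+1} = Y_k - a_k \nabla U(Y_k) + b_k \sigma(Y_k) W_k,$$

$$Y_{k+1} = \tilde Y_{k+1}\,1_D(\tilde Y_{k+1}) + Y_k\,1_{\mathbb{R}^r \setminus D}(\tilde Y_{k+1}).$$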

Lemma 1.2: There exists $\gamma > 1$ such that for any bounded continuous function $f(\cdot)$ on $\mathbb{R}^r$

$$\lim_{n \to \infty}\ \sup_{k:\, t_n \le t_k \le \gamma t_n} \left| E_{t_n,y}\{f(x(t_k))\} - E_{n,y}\{f(Y_k)\} \right| = 0$$

uniformly for $y \in D$.

Proof: Let $y \in D$, $n$ be a positive integer, and $x(t_n) = Y_n = y$. Define $\{\xi_k\}$ by

$$x(t_{k+1}) = x(t_k) - a_k(\nabla U(x(t_k)) + \xi_k) + b_k \sigma(x(t_k)) W_k, \qquad k \ge n.$$

Let $\mathcal{F}_k$ be the $\sigma$-field generated by $\{x(t_n), \xi_n, \ldots, \xi_{k-1}, W_n, \ldots, W_{k-1}\}$ for $k \ge n$. It can be shown that $\xi_k$ is conditionally independent of $\mathcal{F}_k$ given $x(t_k)$. Hence by Proposition 3

$$E\{|\xi_k|^2 \mid \mathcal{F}_k\} \le c_1, \qquad |E\{\xi_k \mid \mathcal{F}_k\}| \le c_1 a_k^{1/2}$$

w.p.1 for all $y \in D$, $k \ge n$, and $n$ large enough. Let $\Delta_k = x(t_k) - Y_k$ for $k \ge n$. We suppress the dependence of $\Delta_k$ on $y$ and $n$. Similarly to the proof of Lemma 1.1 we can show with $\delta = 3/2$ that

$$E\{|\Delta_{k+1}|^2\} \le (1 + c_2 a_k)\,E\{|\Delta_k|^2\} + c_2 a_k^{\delta}, \qquad k \ge n,$$

$$E\{|\Delta_n|^2\} = 0,$$

for all $y \in D$ and $n$ large enough. Applying Proposition 2 there exists a $\gamma > 1$ such that

$$\lim_{n \to \infty}\ \sup_{k:\, t_n \le t_k \le \gamma t_n} E\{|\Delta_k|^2\} = 0,$$

uniformly for $y \in D$. The Lemma now follows as in the proof of Lemma 1.1.

□

Proof of Lemma 1: Follows immediately from Lemmas 1.1 and 1.2.

□

Proof of Lemma 2: Let $y \in D$, $n$ a positive integer, and $s \in [t_n, t_{n+1}]$. Let $x(\cdot\,; s, y)$ denote the process $x(\cdot)$ emitted from $y$ at time $s$. Let $v(\cdot)$ be a standard $r$-dimensional Wiener process starting at time $t_n$ and independent of $x(s; t_n, y)$. Define $x_i(\cdot)$, $i = 1, 2$, by

$$dx_i(t) = -\nabla U(x_i(t))\,dt + c(t)\,\sigma(x_i(t))\,dv(t), \qquad t \ge s,$$

$$x_1(s) = x(s; t_n, y),$$

$$x_2(s) = y.$$

Let $V_k = (v(t_{k+1}) - v(t_k))/\sqrt{a_k}$ for $k > n$, and $V_n = (v(t_{n+1}) - v(s))/\sqrt{t_{n+1} - s}$. Define $\{\xi_{i,k}\}$, $i = 1, 2$, by

$$x_i(t_{k+1}) = x_i(t_k) - a_k(\nabla U(x_i(t_k)) + \xi_{i,k}) + b_k \sigma(x_i(t_k)) V_k, \qquad k > n,$$

$$x_i(t_{n+1}) = x_i(s) - (t_{n+1} - s)(\nabla U(x_i(s)) + \xi_{i,n}) + \sqrt{t_{n+1} - s}\;c(s)\,\sigma(x_i(s))\,V_n.$$

Let $\mathcal{F}_{i,k}$ be the $\sigma$-field generated by $\{x_i(s), \xi_{i,n}, \ldots, \xi_{i,k-1}, V_n, \ldots, V_{k-1}\}$ for $k > n$. It can be shown that $\xi_{i,k}$ is conditionally independent of $\mathcal{F}_{1,k} \vee \mathcal{F}_{2,k}$ given $x_i(t_k)$. Hence by Proposition 3

$$E\{|\xi_{1,k} - \xi_{2,k}|^2 \mid \mathcal{F}_{1,k} \vee \mathcal{F}_{2,k}\} \le c_1, \qquad |E\{\xi_{1,k} - \xi_{2,k} \mid \mathcal{F}_{1,k} \vee \mathcal{F}_{2,k}\}| \le c_1 a_k^{1/2}$$

w.p.1 for all $y \in D$, $s \in [t_n, t_{n+1}]$, $k \ge n$, and $n$ large enough.

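Now observe that

$$E\{|x(t+h) - x(t)|^2 \mid x(t)\} = O(h) \quad \text{as } h \to 0,$$

uniformly for a.e. $x(t) \in D$ and all $t \ge 0$ (this is a standard result except for the uniformity for all $t$, which was remarked on in Proposition 3). Hence

$$E\{|x_1(s) - x_2(s)|^2\} = E\{|x(s; t_n, y) - y|^2\} \le c_2 a_n,$$

for all $y \in D$, $s \in [t_n, t_{n+1}]$, and $n$ large enough. Let $\Delta_k = x_1(t_{k+1}) - x_2(t_{k+1})$ for $k \ge n$. We suppress the dependence of $\Delta_k$ on $y$, $s$ and $n$. Similarly to the proof of Lemma 1.1 we can show with $\delta = 3/2$ that

$$E\{|\Delta_{k+1}|^2\} \le (1 + c_3 a_k)\,E\{|\Delta_k|^2\} + c_3 a_k^{\delta}, \qquad k \ge n,$$

$$E\{|\Delta_n|^2\} \le (1 + c_3 a_n)\,E\{|x_1(s) - x_2(s)|^2\} + c_3 a_n^{\delta} \le c_4 a_n,$$

for all $y \in D$, $s \in [t_n, t_{n+1}]$, and $n$ large enough. Hence

$$\sup_{s:\, t_n \le s \le t_{n+1}} E\{|\Delta_{k+1}|^2\} \le (1 + c_3 a_k) \sup_{s:\, t_n \le s \le t_{n+1}} E\{|\Delta_k|^2\} + c_3 a_k^{\delta}, \qquad k \ge n,$$

for all $y \in D$ and $n$ large enough, and

$$\sup_{s:\, t_n \le s \le t_{n+1}} E\{|\Delta_n|^2\} = O(a_n) \quad \text{as } n \to \infty,$$

uniformly for $y \in D$. Applying Proposition 2 there exists $\gamma > 1$ such that

$$\lim_{n \to \infty}\ \sup_{k:\, t_n \le t_k \le \gamma t_n}\ \sup_{s:\, t_n \le s \le t_{n+1}} E\{|\Delta_k|^2\} = 0, \qquad (3.13)$$

uniformly for $y \in D$.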

Note that $\beta(s)$ is a strictly increasing function of $s$ and $s + s^{2/3} \le \beta(s) \le s + 2s^{2/3}$ for $s$ large enough. Hence for $n$ large enough and any $s$ such that $t_n \le s \le t_{n+1}$ one can choose $m$ such that $t_m \le \beta(s) < t_{m+1}$ and $t_n \le t_m \le \gamma t_n$. As above we can show

$$E\{|x_1(\beta(s)) - x_2(\beta(s))|^2\} \le (1 + c_3 a_m)\,E\{|\Delta_{m-1}|^2\} + c_3 a_m^{\delta}$$

$$\quad \le (1 + c_3) \sup_{k:\, t_n \le t_k \le \gamma t_n} E\{|\Delta_k|^2\} + c_3 a_n^{\delta} \qquad (3.14)$$

for all $y \in D$, $s \in [t_n, t_{n+1}]$, and $n$ large enough. Combining (3.13), (3.14) gives

$$\lim_{n \to \infty}\ \sup_{s:\, t_n \le s \le t_{n+1}} E\{|x_1(\beta(s)) - x_2(\beta(s))|^2\} = 0,$$

uniformly for $y \in D$. Finally, since $x_1(\beta(s))$, $x_2(\beta(s))$ are equal in distribution to $x(\beta(s); t_n, y)$, $x(\beta(s); s, y)$, respectively, the Lemma now follows as in the proof of Lemma 1.1.

□